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ABSTRACT 

The  present  paper  summarizes  a  research  project  focusing  on  ways  to  improve  the  usefulness  of  organization 
level  outcome  measures  of  unit  readiness/effectiveness.  Toward  this  goal,  a  measurement  approach  using  unit 
level  outcome  measures  is  presented.  The  measurement  system  presented  adapts  and  extends  the  performance 
distribution  assessment  approach  proposed  by  Kane  (1986;  1992).  We  demonstrate  that,  while  originally  used 
with  subjective  performance  judgments,  the  system  is  readily  adapted  to  regularly  collected  unit  level  outcomes, 

A  vital  concern  for  the  Air  Force  is  the  maintenance  of  mission  capability  and  readiness.  A  crucial  mechanism 
for  the  maintenance  of  mission  readiness  is  personnel  training.  Of  tremendous  importance  to  the  design, 
implementation,  and  revision  of  training  throughout  the  Air  Force,  as  with  any  organization,  is  the  ability  to 
evaluate  the  effectiveness  of  training  interventions.  Specifically,  the  effective  evaluation  of  any  training 
intervention  is  crucial  to  informed  decision  making  regarding  the  intervention.  Central  to  effective  training 
evaluation  is  the  standard  or  criteria  against  which  the  training  is  evaluated.  In  addition,  the  comprehensive 
evaluation  of  training  interventions  mandates  the  use  of  multiple  criterion  measures. 


Organization  Level  Criterion  Measures 

Organization  level  outcome  measures  represent  global  indices  of  effectiveness.  While  many  commonly  used 
criterion  measures  focus  on  the  assessment  of  individual  effectiveness,  organization  level  measures  most  often 
provide  more  aggregate  measures  of  effectiveness.  Recent  theories  of  the  criterion  construct,  have  begun  to 
recognize  the  inextricable  relationship  between  job  behaviors  and  outcomes.  Along  these  lines  Binning  and 
Barrett  (1989)  argue:  "...  optimal  description  of  the  performance  domain  for  a  given  job  requires  careful  and 
complete  delineation  of  valued  outcomes  and  the  accompanying  requisite  behaviors"  (p.486). 

Problems  with  Outcome-Based  Criterion  Measures 

The  detailed  delineation  of  the  relationship  between  job  performance  and  outcomes  is  especially  relevant  to 
training  evaluation.  An  important  direction  for  future  research  is  a  focus  on  behavior/outcome  linkages  and 
generating  empirical  support  for  these  linkages.  Unfortunately,  the  operationalization  of  specific  outcome 
measures  generates  somewhat  of  a  dilemma  for  training  evaluation.  On  the  one  hand,  the  ultimate  value  of 
training  lies  in  its  ability  to  impact  outcomes  of  value  to  the  organization.  Outcome  measures  (e.g.,  productivity 
levels,  turnover  rates,  error  rates,  etc.)  at  both  individual  and  aggregate  levels  would  appear  to  be  the  ultimate 
criterion  of  interest  for  evaluating  training  interventions.  On  the  other  hand,  these  measures  suffer  from  a  number 
of  problems  that  limit  their  usefulness  as  a  standard  against  which  to  judge  the  impact  of  training. 


First  and  foremost  among  these  problems  is  the  fact  that  these  measures  are  typically  contaminated  to  an 
undetermined  extent  by  sources  of  variance  over  which  the  individual  has  no  control.  Specifically,  the  measured 
outcome  is  to  some  extent  determined  by  factors  other  than  individual  performance.  A  second  problem  with 
outcome  measures  is  that  they  are  not  based  on  a  common  metric.  Outcome  measures  are  often  unique  to 
particular  units  within  an  organization  and  thus  are  difficult  to  interpret  and  compare  across  organizational  work 
groups  or  divisions.  Additionally,  the  lack  of  a  common  metric  typically  precludes  the  meaningful  aggregation  of 
performance  information  across  organizational  units.  A  third  problem  is  that  these  measures  only  provide  an 
indication  of  outcome  as  opposed  to  the  process  underlying  the  outcome.  Thus  these  measures  provide  little,  if 
any,  information  about  the  nature  of  performance.  Finally,  the  traditional  use  of  outcome  measures  offers  little,  if 
any,  means  of  assessing  measurement  quality  (i.e.,  how  good  are  the  measurements  obtained  with  these 
measures). 
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Another  major  limiting  factor  with  respect  to  the  use  of  organizational  level  outcome  measures  is  the  lack  of 
conceptual  and/or  empirical  formulations  specifying  the  potential  linkages  between  personnel  action  and  specific 
outcome  measures.  For  example,  if  the  goal  is  to  evaluate  the  impact  of  a  particular  training  program  with  respect 
to  organizational  outcomes  it  is  important  to  match  the  nature  and  content  of  the  training  with  specific  outcome 
measures  likely  to  be  influenced.  If  the  training  program  focuses  on  improving  maintenance  skills  then  measures 
most  directly  related  to  maintenance  outcomes  should  be  identified  and  examined.  Thus  while  there  may  be 
numerous  outcome  measures  available,  little  if  any  information  exists  pertaining  to  the  performance  relevance  of 
these  measures. 

In  summary,  while  organizational  level  outcome  measures  are  a  potentially  valuable  criterion  against  which  to 
evaluate  training  effectiveness,  several  factors  have  limited  the  utility  of  these  measures.  These  factors  include: 
a)  contamination  by  non-performance  related  factors;  b)  lack  of  a  common  measurement  metric;  c)  a  focus  on 
overall  level  rather  than  the  performance  process;  d)  lack  of  any  indication  of  measurement  quality;  and,  e)  no 
conceptual/empirical  formulations  of  the  linkage  between  specific  actions  and  outcomes.  Thus,  any  system  that 
attempts  to  include  outcome  measures  must  address  these  issues. 

Identification  of  Aircraft  Maintenance  Related  Measures  of  Performance 

One  of  the  primary  objectives  of  the  present  study  was  to  identify  and  examine  the  utility  of  aircraft  maintenance 
related  measures  of  performance  typically  collected  and  used  by  the  Air  Force.  Measures  of  performance 
(MOPs)  are  qualitative  or  quantitative  measures  of  system  capabilities  or  characteristics  (USAF/TEP,  1994). 
Toward  this  end,  several  sources  of  data  were  identified  through  interviews  with  supervisory  level  maintenance 
personnel.  One  source  of  such  data  was  a  combination  of  CAMS-  based  maintenance  data  and  unit  mission 
characteristic  data.  This  data  is  routinely  collected  and  reported  by  aircraft  maintenance  units  as  an  index  of 
mission  effectiveness.  This  data  takes  into  consideration  both  equipment  and  unit  mission  and  manpower 
characteristics.  Example  measures  include  fully  mission  capable  rate  (FMC),  man  hours  per  flying  hours,  air  and 
ground  abort  rates,  and  delayed  discrepancies  (deferred  maintenance  actions).  Although  a  valuable  source  of 
information  with  respect  to  mission  capability,  these  measures  illustrate  many  of  the  disadvantages  associated 
with  operational  measures.  More  specifically,  the  metric  of  each  measure  is  unique  to  the  characteristic  being 
measured.  Thus  data  is  difficult  to  combine  and  summarize  across  measures.  Further,  the  measures  are 
cumbersome  to  summarize.  While  the  measures  lend  themselves  to  typical  overall  summations  such  as  the  mean 
performance  level,  such  measures  of  central  tendency  only  provide  part  of  the  overall  picture.  Other  important 
information  includes  the  amount  of  fluctuation  and  the  percent  of  time  at  or  above  some  preset  standard. 

Despite  these  limitations,  these  indices  have  many  desirable  characteristics  with  respect  to  training  evaluation. 
These  characteristics  include: 

1.  The  measures  are  regularly  and  systematically  collected. 

2.  It  appears  that  these  indices  are  both  required  by  and  reported  to  MAJCOM.  Thus  it  is  likely  that  these 
measures  are  available  Air  Force  wide. 

3.  The  mission  capable/readiness  indices  reflect  both  equipment,  mission,  and  manpower  characteristics. 

4.  The  indices  are  easily  aggregated  from  the  individual  unit  level  to  higher  levels  of  the  organization  (wing, 
command,  etc.). 

5.  The  indices  reflect  multiple  measures  of  performance  within  a  specified  time  span  (iterated  job  function)  and 

thus  are  readily  amenable  to  the  DEAMR  system.  , 

Distributional  Approach  to  Criterion  Measurement 

A  second  objective  of  the  present  study  was  to  develop  and  evaluate  a  measurement  system  that  increases  the 
utility  of  regularly  collected  operational  measures  of  performance.  Toward  this  end  a  specific  measurement 
system  is  presented.  The  measurement  approach  presented  here  extends  the  system  for  assessing  individual 
performance  developed  by  Kane  (1986)  to  outcome  level  criteria  measurement.  It  is  believed  that  this  approach 
may  offer  a  partial  solution  to  the  problems  associated  with  outcome  measures.  The  original  system  presented  by 
Kane  (1986),  labeled  Performance  Distribution  Assessment  (PDA),  is  based  on  the  distributional  measurement 
model  postulated  by  Kane  and  Lawler  (1979).  An  important  characteristic  of  this  model  is  a  focus  on  the  range  of 
performance  observed.  Specifically,  the  model  stipulates  that  not  only  is  the  level  of  performance  important,  but 
the  fluctuation  or  variance  in  performance  must  also  be  considered.  For  example,  two  individuals  may  both  be 
appropriately  characterized  as  "average  performers";  however,  if  one  is  consistently  average  and  the  other 
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alternates  between  very  poor  and  very  good,  very  different  pictures  emerge  with  respect  to  the  individuals' 
performance.  Thus  performance  measurement  must  assess  the  range  of  performance  over  time.  Specifically, 
performance  is  defined  in  terms  of  the  outcomes  of  job  functions  that  are  carried  out  on  multiple  occasions 
within  a  specified  time  span  (i.e.,  iterated  job  functions).  Performance  can  subsequently  be  represented  in  terms 
of  the  frequency  at  which  various  outcome  levels  occurred  within  a  given  time  span. 

Another  important  characteristic  of  the  PDA  approach  is  that  it  incorporates  a  relativistic  scaling  of  performance 
information.  More  specifically,  performance  is  expressed  as  a  ratio  of  actual  performance  (as  reflected  in  the 
performance  distribution  generated)  to  a  maximum  feasible  performance  distribution.  This  maximum  feasible 
distribution  reflects  the  highest  level  of  performance  attainable  given  the  constraints  under  which  the  work 
occurs.  This  scaling  process  serves  to  express  performance  in  terms  of  a  relative  range  of  potential  performance. 
Thus,  the  method  allows  for  quantifiably  excluding  from  consideration  in  the  evaluation  of  performance  the 
range  of  performance  that  is  attributable  to  circumstances  beyond  the  performer's  control. 

The  representation  of  performance  in  distributional  form  along  with  relativistic  scaling  has  several  important 
advantages.  First,  it  allows  for  a  consideration  of  performance  variability  as  well  as  average  levels  of 
performance.  Thus  it  allows  for  an  assessment  of  the  consistency  of  performance  and  the  extent  to  which 
negatively  valued  outcomes  are  avoided.  In  this  way  more  information  is  provided  regarding  the  idiosyncratic 
nature  of  individual  performance.  Second,  the  relativistic  scaling  process  advocated  by  the  PDA  process 
produces  measures  of  the  effectiveness  of  performance  on  relativized  0-100%  scales  with  common  zero  and 
common  upper  limits  of  100%.  Thus  any  given  percentage  level  remains  constant  in  its  meaning  regardless  of  the 
job,  division,  or  even  the  organizational  level  in  which  it  occurs.  At  the  same  time,  the  particular  outcome 
measures  used  to  assess  performance  may  be  individualized  to  meet  situational  demands  and  organizational 
constraints.  Specifically,  if  positions  have  appreciably  different  content  and  extraneous-constraint  conditions, 
measures  can  be  scaled  to  account  for  these  differences. 

The  PDA  approach  was  originally  advocated  as  method  for  enhancing  performance  ratings.  Specifically,  it  was 
formulated  to  incorporate  subjective  estimates  of  individual  performance  outcome  frequencies  (i.e.,  supervisory 
ratings  of  the  frequency  at  which  individuals  performed  at  a  particular  level).  However,  its  focus  on  the 
frequency  of  particular  performance  outcomes  make  it  particularly  amenable  to  use  with  more  objective  outcome 
measures.  Thus,  the  application  of  this  methodology  to  the  measurement  of  organizational  outcomes  using 
iterative  operational  measures  appears  to  be  a  fruitful  avenue  for  research  and  may  serve  to  increase  the  utility  of 
these  measures  in  the  training  evaluation  process. 

Adaptation  of  the  PDA  Approach  for  Outcome  Level  Measures 

As  noted  above,  the  PDA  system  appears  to  be  well  suited  for  the  measurement  and  scaling  of  operational 
criterion  measures.  For  purposes  of  illustration.  Table  1  presents  hypothetical  evaluation  data  for  the  man  hour 
per  fly  hour  measure  presented  in  PDA  format.  In  Table  1  the  performance  range  represents  5  equidistant  steps 
between  the  highest  possible  performance  outcome  (listed  as  .30  in  the  Table)  and  the  lowest  performance  level 
(listed  as  28.70  in  the  Table).  These  values  are  based  on  an  actual  count  of  man  hour  per  fly  hour  measure  over 
the  course  of  1  fiscal.  The  utility  weights  represent  the  utility  or  value  to  the  organization  of  performance  at  each 
of  the  5  levels.  These  are  hypothetical  values  in  this  case  and  would  be  based  on  SME  estimates.  The  comparison 
level  values  represent  a  "benchmark"  distribution.  This  "benchmark"  distribution  may  represent  either  an 
estimated  ideal  distribution  of  performance  or  the  actual  performance  distribution  of  a  comparison  unit  (i.e.,  an 
earlier  time  frame  or  another  work  unit).  The  diagram  in  Table  1  shows  the  relationship  between  the  2 
performance  distributions  represented  by  the  actual  performance  values  and  the  comparison  level.  Based  on  this 
information  distributional  characteristics  for  the  actual  performance  values  are  presented  (these  characteristics 
may  be  expressed  in  either  the  utility  weight  metric  or  the  performance  level  scale).  The  total  performance 
effectiveness  score  represents  a  quantitative  index  in  a  percentage  index,  that  relates  the  actual  distribution  to  the 
comparison  distribution. 

This  revised  approach  to  the  performance  distribution  model  is  labeled  here  as  Distribution-Based  Evaluation 
and  Assessment  of  Mission  Readiness  (DEAMR).  This  approach  extends  the  beneficial  characteristics  of  relative 
distribution  based  performance  assessment  to  organization  level  outcome  measures. 

Table  1 :  Sample  Performance  Level  Frequencies  and  Distributional  Characteristics  for  the  Man  Hour  per  Fly 
Hour  Measure. 
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Performance  Frequencies 

Distribution  Characteristics 

Perf. 

Level 

Perf. 

Range 

Utility 

Weights 

Perf. 

Level 

Freq. 

Perf. 

Level 

% 

Comparison 
Level  % 

Perf. 

Level 

Scale 

1 

28.70 

0 

0 

0 

Mean  = 

38.46 

3.77 

2 

21.60 

1 

8 

1 

SD  = 

34.83 

0.70 

3 

14.50 

0 

2 

15 

10 

Skewness  = 

m 

D 

7.40 

50 

9 

69 

80 

Kurtosis  = 

1.19 

1.19 

5 

.30 

100 

1 

8 

9 

Negative 
Range  Score 

Total 

Obs.  — 

13 

Total  Perf. 
Effectiveness 

85.09 

More  specifically,  characteristics  of  the  DEAMR  process  include: 

1.  Performance  measurement  is  relativistic.  Outcome  measures  are  scaled  relative  to  maximum  possible  and 
minimum  acceptable  performance  levels.  Performance  distributions  are  relative  to  some  "benchmark" 
distribution.  Thus,  measurement  considers  the  extraneous  factors  that  may  influence  outcome  measures. 

2.  Performance  measurement  is  based  on  common  metric.  All  measures  are  expressed  in  terms  of  percentages 
and  thus  have  minimum  and  maximum  points. 

3.  Multiple  measures  of  performance  are  provided;  performance  is  described  in  terms  of  mean  level,  consistency, 
and  negative  range  avoidance.  These  multiple  measures  provide  more  information  about  the  nature  of 
performance  and  performance  problems. 

Another  important  characteristic  of  the  DEAMR  system  is  that  it  is  easily  automated.  Relatively  little  data  is 
required  in  order  to  calculate  the  distributional  parameters.  This  data  required  includes  the  highest  possible  and 
lowest  acceptable  performance  level,  an  estimate  of  the  utility  weight  associated  with  the  lowest  acceptable 
performance  level,  and  the  actual  frequency  of  performance  outcomes  at  each  of  the  performance  levels.  This 
data  may  then  simply  be  input  into  computer  or  spreadsheet  based  programs  to  calculate  the  distributional 
characteristics. 
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